karen: support-first prompt + pre-purchase scenarios#317

Closed
deepmasq wants to merge 8 commits into main from feat/karen-pre-purchase-quality

Conversation

@deepmasq
Contributor

@deepmasq deepmasq commented Apr 16, 2026

Summary

Fibery #2403 — Karen Pre-purchase answers and recommendations quality.

  • Restructure very_limited expert prompt: support-first, sales-assist only on buying intent
  • Extract C.L.O.S.E.R. + BANT to skills/sales-closer/SKILL.md (loaded on demand via flexus_fetch_skill)
  • Add 3 benchmark scenario YAMLs for pre-purchase conversations
  • Fallback in prompt if skill unavailable
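The load-on-demand-with-fallback pattern can be sketched as below. This is a minimal illustration only: the real `flexus_fetch_skill` signature and the actual prompt-assembly code in `karen_prompts.py` are not shown in this PR, so `fetch_skill`, `build_sales_assist_section`, and the fallback text are all assumptions.

```python
# Illustrative sketch only. flexus_fetch_skill's real signature is not
# shown in this PR; fetch_skill below is a hypothetical stand-in.

FALLBACK_SALES_GUIDANCE = (
    "If the sales-closer skill is unavailable: listen more than you talk, "
    "clarify the problem, and offer a human when stuck."
)

def build_sales_assist_section(fetch_skill) -> str:
    """Return the sales-assist prompt section, preferring the on-demand skill.

    fetch_skill is any callable that returns the skill text by name
    (e.g. a wrapper around flexus_fetch_skill) or fails/returns None.
    """
    try:
        skill_text = fetch_skill("sales-closer")
    except Exception:
        skill_text = None
    # Fall back to the inline one-liner when the skill cannot be loaded.
    return skill_text or FALLBACK_SALES_GUIDANCE
```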

Tournament Result

Produced via /tournament (N=2). Candidate A (Opus, conservative reorder) scored 7.65, Candidate B (Sonnet, skill extraction) scored 7.05. Judge recommended SYNTHESIZE: A's prompt structure + scenarios, B's skill extraction.

| What | Source |
| --- | --- |
| Prompt structure (support/recommendations/sales-assist sections) | Candidate A |
| 3 scenario YAMLs | Candidate A |
| skills/sales-closer/SKILL.md | Candidate B |
| Skill-loading trigger + fallback line | Synthesis |

Files

| File | Change |
| --- | --- |
| karen_prompts.py | Rewrite very_limited section (+29/−19) |
| skills/sales-closer/SKILL.md | New (24 lines) — C.L.O.S.E.R. + BANT on demand |
| very_limited__pre_purchase_plan_comparison.yaml | New scenario — team comparing Pro vs Enterprise |
| very_limited__pre_purchase_just_browsing.yaml | New scenario — pure support, no buying intent |
| very_limited__pre_purchase_browsing_to_intent.yaml | New scenario — support → buying intent transition |
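For orientation, a scenario file along these lines might look like the sketch below. The actual schema used by the benchmark harness is not shown in this PR, so every field name here (`name`, `turns`, `judge_instructions`) is an assumption.

```yaml
# Hypothetical sketch of a pre-purchase benchmark scenario; the real
# schema used by these YAMLs is not shown in this PR.
name: very_limited__pre_purchase_just_browsing
description: Pure support conversation with no buying intent.
turns:
  - user: "What's the difference between your Pro and Enterprise plans?"
judge_instructions: |
  Karen should answer from flexus_vector_search results only,
  without upselling or asking unnecessary follow-up questions.
```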

Benchmark Results (staging, v1.2.231)

Scores by model (actual_rating / 10)

| Scenario | grok-4-1-fast-reasoning | claude-sonnet-4-6 | gpt-5.4 |
| --- | --- | --- | --- |
| plan_comparison | 8 | — | — |
| saas_cs_platform_short (regression) | 8 | — | — |
| just_browsing | 6 | 5 | 8 |
| browsing_to_intent | 4 | 8 | 8 |

Key findings

  • gpt-5.4 passes all scenarios at 8/10 — best tool-following compliance (uses flexus_vector_search correctly, no fabrication, no unnecessary follow-ups)
  • grok fabricates during sales transitions (hallucinated pricing, invented links) — known model behavior issue
  • claude-sonnet inconsistent — passes browsing_to_intent (8) but fails just_browsing (5) on tool selection
  • No regression on existing sales scenario (8/10 on grok)
  • Prompt iteration applied: v1 → v2 tightened "use flexus_vector_search not product_catalog", removed follow-up loophole, added anti-fabrication guardrail

Model recommendation

Karen's very_limited expert should use gpt-5.4 for customer-facing conversations. It follows support-first instructions and tool constraints most reliably. grok-4-1-fast-reasoning remains fine for the default (admin) expert where sales compliance is less critical.

Test plan

  • Run 3 new scenarios against staging (grok): plan_comparison 8, just_browsing 6, browsing_to_intent 4
  • Run regression scenario (grok): saas_cs_platform_short 8 — no regression
  • Re-run failing scenarios with claude-sonnet: just_browsing 5, browsing_to_intent 8
  • Re-run failing scenarios with gpt-5.4: just_browsing 8, browsing_to_intent 8
  • Switch very_limited expert model to gpt-5.4 (separate change)

🤖 Generated with Claude Code

Art Koval and others added 3 commits April 16, 2026 15:59
…e scenarios

tournament synthesis: A's prompt structure (support-first, explicit intent signals,
strong anti-interrogation) + B's skill extraction (sales-closer loaded on demand
via flexus_fetch_skill) + fallback if skill unavailable.

3 new benchmark scenarios: plan comparison, just browsing, browsing-to-intent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backslash-quote inside a single-quoted YAML string broke the parser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rication

fixes 2 failing scenarios:
- just_browsing: karen used product_catalog instead of flexus_vector_search,
  asked follow-ups in pure support mode
- browsing_to_intent: fabricated links not grounded in KB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deepmasq deepmasq requested a review from humbertoyusta April 17, 2026 10:15
@deepmasq deepmasq marked this pull request as ready for review April 17, 2026 10:15
Art Koval and others added 3 commits April 17, 2026 13:17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…io, clean judge_instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Humberto feedback: don't tell the model "NEVER do X" when X was
never instructed. Removed "NEVER interrogate", "Don't upsell",
"NEVER push for a decision / manufacture urgency".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ground every recommendation in flexus_vector_search() results. No invented features.
- If they say "I'll think about it" or "let me check with my team" — that's a valid outcome. Offer to help later, resolve.

## Sales-Assist (Only on Buying Intent)
Contributor


If we're splitting, and it's mostly a support bot, do we really need to separate a default support mode from sales assist?

The only sales-only part of the prompt is:

> When you detect buying intent: listen 70% talk 30%, clarify their problem, paint the outcome not features, handle objections honestly, offer a human when stuck.

which, I think, does not justify the "if this, then support; if this, then sell" branching. Just telling it to answer should be fine, in only one mode, which is mostly support.

Art Koval and others added 2 commits April 21, 2026 11:58
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deepmasq
Contributor Author

Superseded by #322 (sales extraction from Karen) which removed the Sales-Assist section, BANT, and C.L.O.S.E.R. entirely. The pre-purchase scenarios from this PR should be re-added separately if needed.

@deepmasq deepmasq closed this Apr 21, 2026